Introduction to Data Analytics

Erin M. Buchanan

Last Updated: 2021-01-01

Welcome to ANLY 500

What is Analytics? - Possible Definition 1

What is Analytics? - Possible Definition 2

Scope of Analytics?

What is Descriptive Analytics? (1)

An Example of what to Expect in Descriptive Analytics: Ex.1.1

library(datasets)
data("sunspot.month") # load a dataset that ships with R
head(sunspot.month)
## [1] 58.0 62.6 70.0 55.7 85.0 83.5

An Example of what to Expect in Descriptive Analytics: Ex.1.1

str(sunspot.month)
##  Time-Series [1:3177] from 1749 to 2014: 58 62.6 70 55.7 85 83.5 94.8 66.3 75.9 75.5 ...

An Example of what to Expect in Descriptive Analytics: Ex.1.1

summary(sunspot.month)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   15.70   42.00   51.96   76.40  253.80
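
The summary above pools every month into one set of numbers. As a minimal sketch of a further descriptive step, the monthly counts can be averaged within each calendar year. The names first, idx, obs_year, and yearly_mean are illustrative and not part of the original example; the year is derived from the start of the series and its frequency, assuming a regular monthly series.

```r
library(datasets)
data("sunspot.month")

# c(year, month) of the first observation in the series
first <- start(sunspot.month)

# Months elapsed since the first observation (0, 1, 2, ...)
idx <- seq_along(sunspot.month) - 1

# Calendar year of each observation, using integer division by the frequency
obs_year <- first[1] + (first[2] - 1 + idx) %/% frequency(sunspot.month)

# Mean monthly sunspot count within each year
yearly_mean <- tapply(as.numeric(sunspot.month), obs_year, mean)
head(yearly_mean)

# Spread of the monthly counts, which summary() does not report
sd(sunspot.month)
```

Grouping by year smooths out month-to-month noise, which makes the long-run cycle easier to see than in the raw monthly values.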

An Example of what to Expect in Descriptive Analytics: Ex.1.1

library(ggplot2)
sunspot.month <- as.data.frame(sunspot.month) # a ts converts to one column named x
sunspot.month$Time <- seq_len(nrow(sunspot.month)) # month index, starting at 1
ggplot(sunspot.month, aes(x = Time, y = x)) + 
  geom_point(alpha = 0.5) + 
  ylab("Number of Sunspots") + 
  xlab("Time") +
  theme_classic()

What is Predictive Analytics?

An Example of what to Expect in Predictive Analytics: Ex.2.1

library(quantmod)
start <- Sys.Date() - 365 * 5 # roughly five years before today
end <- Sys.Date() - 2 # two days before today
getSymbols("AMZN", src = "yahoo", from = start, to = end)
## [1] "AMZN"
str(AMZN)
## An 'xts' object on 2016-01-04/2020-12-29 containing:
##   Data: num [1:1257, 1:6] 656 647 622 622 620 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:6] "AMZN.Open" "AMZN.High" "AMZN.Low" "AMZN.Close" ...
##   Indexed by objects of class: [Date] TZ: UTC
##   xts Attributes:  
## List of 2
##  $ src    : chr "yahoo"
##  $ updated: POSIXct[1:1], format: "2021-01-01 22:40:50"

An Example of what to Expect in Predictive Analytics: Ex.2.1

predictive_model <- lm(formula = AMZN.Close ~ AMZN.High + AMZN.Low + AMZN.Volume, 
                       data = AMZN[1:1199,])
summary(predictive_model)
## 
## Call:
## lm(formula = AMZN.Close ~ AMZN.High + AMZN.Low + AMZN.Volume, 
##     data = AMZN[1:1199, ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -99.653  -5.406  -0.195   5.632 100.519 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.971e-01  1.636e+00    0.12    0.904    
## AMZN.High   4.799e-01  2.495e-02   19.23   <2e-16 ***
## AMZN.Low    5.210e-01  2.564e-02   20.32   <2e-16 ***
## AMZN.Volume 1.620e-08  2.714e-07    0.06    0.952    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.93 on 1195 degrees of freedom
## Multiple R-squared:  0.9995, Adjusted R-squared:  0.9995 
## F-statistic: 7.93e+05 on 3 and 1195 DF,  p-value: < 2.2e-16

An Example of what to Expect in Predictive Analytics: Ex.2.1

par(mfrow = c(2, 3)) # 2 x 3 grid for the diagnostic plots
plot(predictive_model, which = 1:5)

An Example of what to Expect in Predictive Analytics: Ex.2.1

n <- nrow(AMZN)
prediction <- stats::predict(predictive_model, AMZN[1200:n,])
tail(data.frame(prediction))
##            prediction
## 2020-12-21   3198.146
## 2020-12-22   3203.073
## 2020-12-23   3199.503
## 2020-12-24   3187.688
## 2020-12-28   3238.624
## 2020-12-29   3317.538

An Example of what to Expect in Predictive Analytics: Ex.2.1

plot(prediction, type = "l")
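
Plotting the predictions shows their shape but not how far they fall from the observed prices. A minimal sketch of that check follows. It uses simulated prices rather than the downloaded AMZN data so that it runs offline; the data frame prices, its columns Close, High, and Low, the 80/20 split, and the names holdout_model, holdout_pred, and rmse are all assumptions made for illustration.

```r
set.seed(42) # assumption: fixed seed so the simulated data are repeatable

# Simulated daily prices, a hypothetical stand-in for the AMZN object
n_days <- 500
close <- 1000 + cumsum(rnorm(n_days, mean = 1, sd = 15))
prices <- data.frame(
  Close = close,
  High  = close + abs(rnorm(n_days, sd = 10)), # always at or above Close
  Low   = close - abs(rnorm(n_days, sd = 10))  # always at or below Close
)

# Fit on the first 80% of rows and hold out the rest (assumed split)
n_train <- floor(0.8 * n_days)
train <- prices[seq_len(n_train), ]
test  <- prices[(n_train + 1):n_days, ]

holdout_model <- lm(Close ~ High + Low, data = train)
holdout_pred  <- predict(holdout_model, newdata = test)

# Root mean squared error on rows the model never saw
rmse <- sqrt(mean((test$Close - holdout_pred)^2))
rmse
```

The same comparison could be made with the real data by setting the predicted values against AMZN.Close for rows 1200 through n.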

What is Prescriptive Analytics?

What does this translate into?

What is Data Analytics?

A Subcomponent of Data Analytics is Data Analysis!

Other Types of Analysis

How to Correctly Apply Data Analytics?

Breaking Down the Research Process - The Initial Observation

Breaking Down the Research Process - Generating Theories

Breaking Down the Research Process - Creating a Hypothesis

Breaking Down the Research Process - Testing Theories & Hypotheses

Breaking Down the Research Process - Identifying the Variables

What’s After the Question & Identifying Variables?

What is Data?

Types of Measurements

Categorical Variables

Categorical Levels of Measurement - Binary

Categorical Levels of Measurement - Nominal

Categorical Levels of Measurement - Ordinal

Continuous Variables

Continuous Levels of Measurement - Interval

Continuous Levels of Measurement - Ratio

Consider Measurement Error:

How Valid Are My Measures?

Are My Measures Reliable?

Breaking Down the Research Process - Collecting the Data

Cross-Sectional Research

Longitudinal Research

Correlational Research

Experimental Research

Experimental Research - Methods

Breaking Down the Research Process - Methods to Collect the Data

Types of Variation in the Data to Consider:

Breaking Down the Research Process - Analyzing the Data

Population vs Sample

Fitting Models

tapply(iris$Sepal.Length, iris$Species, mean)
##     setosa versicolor  virginica 
##      5.006      5.936      6.588

Statistical Modeling Parameters

iris_sample <- iris[sample(nrow(iris), 15), ] # random draw, so results vary per run
tapply(iris_sample$Sepal.Length, iris_sample$Species, mean) # sample estimates
##     setosa versicolor  virginica 
##   5.157143   5.700000   6.560000
tapply(iris$Sepal.Length, iris$Species, mean) #population
##     setosa versicolor  virginica 
##      5.006      5.936      6.588
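
A single sample of 15 gives one set of estimates. A minimal sketch of how those estimates vary from sample to sample is below, treating the full iris data as the population as the example above does. The seed, the 1000 repetitions, and the names population_mean and sample_means are assumptions chosen for illustration.

```r
set.seed(500) # assumption: arbitrary seed, only to make the run repeatable

# Population parameter: mean sepal length across all 150 flowers
population_mean <- mean(iris$Sepal.Length)

# Draw 1000 samples of 15 flowers and record the mean of each sample
sample_means <- replicate(
  1000,
  mean(iris$Sepal.Length[sample(nrow(iris), 15)])
)

population_mean    # the value being estimated
mean(sample_means) # centre of the sample estimates
sd(sample_means)   # typical size of the sampling error
```

The sample means cluster around the population mean, and their standard deviation indicates how far a single sample of this size can be expected to miss.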

Applicable Statistical Models

Summary